Universal scaling of distances in complex networks 
Universal scaling of distances between vertices of Erdős-Rényi random graphs, scale-free Barabási-Albert models, science collaboration networks, biological networks, Internet Autonomous Systems and public transport networks is observed. The mean distance between two nodes of degrees $k_i$ and $k_j$ equals $\langle l_{ij} \rangle = A - B \log(k_i k_j)$. The scaling is valid over several decades. A simple theory for the appearance of this scaling is presented. The parameters $A$ and $B$ depend on the mean degree $\langle k \rangle_{nn}$ calculated for the nearest neighbors and on the network clustering coefficients.



PACS numbers: 89.75.Hc, 02.50.-r, 89.75.Da

Recently, much effort has been put into the investigation of network systems in order to recognize their structures and emerging complex properties. The empirical analysis of many real complex networks has revealed the presence of several universal scaling laws. The scale-free behavior of degree distributions $P(k) \sim k^{-\gamma}$, observed in a number of social, biological and technological systems, is probably the most amazing. Aside from that, many further scaling laws have been found, such as a power-law dependence of the clustering coefficient $c(k)$ on node degree in hierarchical networks, scaling of the connection weight distribution, of the connection load distribution, the load dependence on degree [10] and others.

At the macro-scale one can describe a whole network by the dependence of the mean distance between any pair of nodes on the network size. In many real networks the small-world effect is observed, i.e. the mean distance $l$ between nodes of such networks increases not faster than logarithmically with their size $N$. In scale-free networks the small-world effect changes to the ultra-small-world effect ($l \sim \log \log N$) when $\gamma < 3$. It was also observed that if network disorder is present, optimal paths become much longer and the small-world effect disappears. Recent research on complex networks is slowly shifting from problems of network topology to directed and weighted networks, network dynamics, as well as to the issue of network efficiency.

In the present paper we come back to network geometry and analyze a surprising empirical scaling that has not been considered before. We think that our observations can be important for understanding network structures and the processes driving their evolution, as well as for constructing search algorithms in real web-like systems. We show that the mean distance between nodes with degrees $k_i$ and $k_j$ is given by the following relation

$$\langle l_{ij} \rangle = A - B \log(k_i k_j). \qquad (1)$$

The above scaling law is shown to be correct not only for network models but also for many real networks, regardless of their degree distribution and correlation profiles.

Fig. 1 presents the mean distance $\langle l_{ij} \rangle$ between pairs of nodes $i$ and $j$ as a function of the product of their degrees $k_i k_j$ in selected complex networks. The analyzed systems belong to very different classes, ranging from generic models of random graphs and scale-free networks, through natural systems such as food webs and metabolic networks, to man-made ones like the Internet and public transport networks. We include data for Erdős-Rényi random graphs, Barabási-Albert evolving networks, biological networks (Silwood, Ythan, Yeast), social networks (co-authorship groups Astro and Cond-mat), Internet Autonomous Systems and selected public transport networks in Polish cities (Gorzow Wlkp., Lodz, Zielona Gora). One can see that the relation (1) is very well fulfilled over several decades for all our data. Let us stress that the networks mentioned above display a wide variety of basic characteristics. Among them there are both scale-free and single-scale networks, with either negligible or very high clustering coefficient, assortative, disassortative or uncorrelated. The only apparent common feature of all the above systems is the small-world effect. We have checked, however, that for the small-world Watts-Strogatz model the scaling (1) is nearly absent: it is visible only for large rewiring probability, and only for larger degrees, where nodes have many shortcuts.
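The kind of measurement summarized in Fig. 1 can be sketched in a few lines. The Python example below is only an illustration under stated assumptions, not the procedure used to produce the figures: it builds a small Erdős-Rényi graph, finds all shortest paths by breadth-first search, averages the distance for every value of the product $k_i k_j$, and fits $A$ and $B$ by least squares weighted with the number of pairs. The function names, the graph size, the seed, the base-10 logarithm and the weighting are all choices made here for illustration.

```python
import math
import random
from collections import defaultdict, deque


def erdos_renyi(n, mean_degree, rng):
    """Adjacency lists of a G(n, p) graph with p = mean_degree / (n - 1)."""
    p = mean_degree / (n - 1)
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path lengths from source; unreachable nodes keep -1."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def fit_distance_scaling(adj):
    """Return (A, B) from a weighted fit of <l_ij> = A - B*log10(k_i*k_j)."""
    total = defaultdict(float)  # sum of distances per degree product
    pairs = defaultdict(int)    # number of connected pairs per degree product
    for i in range(len(adj)):
        dist = bfs_distances(adj, i)
        for j in range(i + 1, len(adj)):
            if dist[j] > 0:     # skip unreachable nodes
                product = len(adj[i]) * len(adj[j])
                total[product] += dist[j]
                pairs[product] += 1
    xs = [math.log10(p) for p in total]
    ys = [total[p] / pairs[p] for p in total]
    ws = [pairs[p] for p in total]
    w_sum = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(ws, ys)) / w_sum
    s_xy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    s_xx = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    slope = s_xy / s_xx
    return y_mean - slope * x_mean, -slope


if __name__ == "__main__":
    graph = erdos_renyi(500, 8.0, random.Random(1))  # illustrative size and seed
    a_fit, b_fit = fit_distance_scaling(graph)
    print("A =", round(a_fit, 2), "B =", round(b_fit, 2))
```

For a graph this small the fitted values are expected to differ from those of the much larger networks in Fig. 1; the sketch only shows how the averaging over pairs with equal $k_i k_j$ can be organized.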

Although the scaling works well for distances averaged over all pairs of nodes with a given product $k_i k_j$, there can be large differences if one changes $k_i$ while keeping $k_i k_j$ constant. Fig. 2 presents the dependence of the average path length on $k_i$ for a fixed product $k_i k_j$ in the case of the Astro network and of the Internet Autonomous Systems in 1999. One can see that although the Astro network is assortative (short-range attraction), pairs of nodes with similar degrees are on average further apart than pairs with different degrees (long-range repulsion). For the disassortative AS network the behavior is opposite. For uncorrelated networks (Erdős-Rényi, Barabási-Albert), the average path length is constant if the product $k_i k_j$ is fixed.

FIG. 1: Mean distance $\langle l_{ij} \rangle$ between pairs of nodes $i$ and $j$ as a function of the product of their degrees $k_i k_j$. (a) Erdős-Rényi random graphs: $\langle k \rangle = 20$ and $N = 10000$ (squares), $N = 50000$ (circles). (b) Barabási-Albert networks: $\langle k \rangle = 8$ and $N = 1000$ (squares), $N = 10000$ (circles). (c) Biological networks: Silwood (squares), Yeast (triangles), Ythan (diamonds). (d) Co-authorship networks: Astro (squares), Cond-mat (circles). (e) Internet Autonomous Systems: year 1997 (squares), year 1998 (circles), year 1999 (triangles), year 2001 (diamonds). (f) Public transport networks in Polish cities: Gorzow Wlkp. (squares), Lodz (triangles), Zielona Gora (circles). In (b), (d) and (e) data are logarithmically binned with the power of 2, in (c) and (a) with the power of 1.25; in (f) data are not binned.



To justify the relation (1) let us consider a simple approach based on the concept of branching trees exploring the space of a random network. We need to estimate the mean shortest path between a node $i$ of degree $k_i$ and a node $j$ of degree $k_j$. Let us notice that following a random direction of a randomly chosen edge one approaches node $j$ with probability $p_j = k_j/(2E)$, where $2E = N \langle k \rangle$ is twice the number of links. It follows that on average one needs $M_j = 1/p_j = 2E/k_j$ random trials to arrive at the node $j$.
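The probability $p_j = k_j/(2E)$ can be checked numerically on a toy graph. The following sketch is an illustration only: the edge list, the number of trials, the seed and the helper names `degree_table` and `hit_frequencies` are hypothetical choices made here. It picks a random edge, follows it in a random direction, and compares how often each node is reached with $k_j/(2E)$.

```python
import random
from collections import Counter


def degree_table(edges):
    """Degree of every node appearing in an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def hit_frequencies(edges, trials, rng):
    """Fraction of trials in which a random edge, followed in a random
    direction, ends at each node."""
    hits = Counter()
    for _ in range(trials):
        u, v = rng.choice(edges)
        hits[v if rng.random() < 0.5 else u] += 1
    return {node: hits[node] / trials for node in degree_table(edges)}


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]  # toy graph, 2E = 10
    degree = degree_table(edges)
    two_e = 2 * len(edges)
    freq = hit_frequencies(edges, 100_000, random.Random(1))
    for node in sorted(degree):
        p_j = degree[node] / two_e   # predicted probability k_j / (2E)
        m_j = two_e / degree[node]   # mean number of trials M_j = 2E / k_j
        print(node, "observed", round(freq[node], 3), "predicted", p_j, "M_j", m_j)
```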

Now let us consider a branching process represented by the tree $T_i$ (Fig. 3) that starts at the node $i$, where the average branching factor is $\kappa$ (all loops are neglected). If the distance between the node $i$ and the surface of the tree equals $x$, then on average there are $N_i = k_i \kappa^{x-1}$ nodes at such a surface and the same number of links ending at these nodes. It follows that on average the tree $T_i$ touches the node $j$ when $N_i = M_j$, i.e. when

$$k_i k_j \kappa^{x-1} = N \langle k \rangle. \qquad (2)$$

Since the mean distance from the node $i$ to the node $j$ is $\langle l_{ij} \rangle = x$, we get the scaling relation (1) with

$$A = 1 + \frac{\log(N \langle k \rangle)}{\log \kappa} \quad \text{and} \quad B = \frac{1}{\log \kappa}. \qquad (3)$$

The result (3) is in agreement with an earlier paper in which the concept of generating functions for random graphs has been used.
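As a quick numerical check of Eq. (3), the sketch below evaluates $A$ and $B$ for the Erdős-Rényi graph with $N = 10000$ and $\langle k \rangle = 20$. Two assumptions are made here and are not stated in the text: the logarithm is taken to base 10, which is the choice that reproduces the tabulated theoretical values, and $\kappa$ is set equal to $\langle k \rangle$, which holds for a Poisson degree distribution where $\langle k \rangle_{nn} = \langle k \rangle + 1$. The function names are illustrative.

```python
import math


def tree_coefficients(n_nodes, mean_degree, kappa):
    """A and B of Eq. (3), with base-10 logarithms (an assumption)."""
    log_kappa = math.log10(kappa)
    a = 1.0 + math.log10(n_nodes * mean_degree) / log_kappa
    b = 1.0 / log_kappa
    return a, b


def mean_distance(k_i, k_j, a, b):
    """Predicted mean distance between nodes of degrees k_i and k_j, Eq. (1)."""
    return a - b * math.log10(k_i * k_j)


if __name__ == "__main__":
    # kappa = <k> is assumed here for a Poisson degree distribution.
    a, b = tree_coefficients(10000, 20.0, 20.0)
    print("A =", round(a, 3), "B =", round(b, 3))
    print("distance for k_i = k_j = 20:", round(mean_distance(20, 20, a, b), 3))
```

Under these assumptions the result lies close to the theoretical entries $A = 5.08$ and $B = 0.769$ listed for this graph in the comparison table; a small difference in $A$ is expected because the tabulated values use $\kappa$ measured on the actual realization.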

One has to take into account that in the above considerations we have assumed that there are no degree-degree correlations, we have neglected all loops, and we have treated the branching level $x$ as a continuous variable to fulfill the relation (2). The last approximation can be improved if one finds the probability distribution $P(l_{ij})$ and the corresponding average value $\langle l_{ij} \rangle$. Such an approach has been performed in our papers, where we have applied the concept of hidden variables and have obtained the same value of the parameter $B$ and small corrections to $A$.

FIG. 2: Dependence of the average path length on $k_i$ for a fixed product $k_i k_j$. The lines connecting the symbols are there for clarity. The bars show point weight, meaning the relative numbers of pairs $ij$. The horizontal lines are weighted averages over $k_i$, and are just the average path lengths for a given $k_i k_j$. The top panel shows data for the Internet Autonomous Systems, the bottom panel for the Astro co-authorship network. Note: the very small shifts on the $k_i$ axis between data for different $k_i k_j$ are introduced artificially so that the weight bars do not overlap.

The mean branching factor $\kappa$ is a mean value over all local branching factors and over all trees in the network. In the first approximation it could be estimated as the arithmetic mean of the nearest-neighbor degree less one (the incoming edge): $\kappa = \langle k \rangle_{nn} - 1$. Such a mean value is however not exact, because the local branching factors in every tree are multiplied by one another in Eq. (2). The correct mean value of $\kappa$ should be taken as an arithmetic mean over the geometric means from different trees, which is very difficult to perform numerically. We calculate the arithmetic mean branching factor over the nearest neighborhood of every node $m$, i.e. $\kappa^{(m)} = \langle k \rangle_{nn}^{(m)} - 1$, and then average it geometrically over all nodes $m$, i.e. $\kappa = \left( \prod_{m} \kappa^{(m)} \right)^{1/N}$. Although our approach is not exact, it leads to a good agreement between the coefficients $A_e$, $B_e$ taken from real networks (see Table 1) and $A$, $B$ calculated from our model.
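The averaging prescription for the branching factor can be written down compactly. The following Python sketch is one possible reading of it, not the authors' code: for every node it takes the arithmetic mean of the neighbors' degrees minus one, and then combines these local values by a geometric mean. The function name and the adjacency-dictionary input are illustrative, and skipping nodes whose local factor is not positive (for which the logarithm is undefined) is an assumption made here.

```python
import math


def mean_branching_factor(adj):
    """Geometric mean over nodes of (mean neighbor degree - 1).

    adj maps each node to the list of its neighbors (undirected graph).
    """
    degree = {node: len(neighbors) for node, neighbors in adj.items()}
    log_sum = 0.0
    used = 0
    for node, neighbors in adj.items():
        if not neighbors:
            continue  # isolated node: no neighborhood to average over
        local = sum(degree[v] for v in neighbors) / len(neighbors) - 1.0
        if local <= 0.0:
            continue  # assumption: skip nodes with non-positive local factor
        log_sum += math.log(local)
        used += 1
    if used == 0:
        raise ValueError("no node with a positive local branching factor")
    return math.exp(log_sum / used)


if __name__ == "__main__":
    # Complete graph on four nodes: every neighbor has degree 3,
    # so every local factor equals 3 - 1 = 2.
    complete4 = {m: [v for v in range(4) if v != m] for m in range(4)}
    print(mean_branching_factor(complete4))
```

Working with the sum of logarithms avoids overflow of the product when the number of nodes is large.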

FIG. 3: Tree formed by a random process, starting from the node $i$ and approaching the node $j$.

The influence of loops of length three on the relation (1) can be estimated as follows. Let us assume that in the branching process forming the tree $T_i$ two nodes from the nearest neighborhood of the node $i$ are directly linked (the dashed line in Fig. 3). In fact such a situation can occur at any point of the branching tree $T_i$. Since such links are useless for further network exploration by the tree $T_i$, the effective contribution from both connected nodes to the mean branching factor of the tree $T_i$ is decreased. Assuming that the clustering coefficients of all nodes are the same, the corrected factor for the branching process equals $\kappa (1 - c)$, where $c$ is the network clustering coefficient. This equation is not valid for the branching process around the node $i$, where $\kappa_i = \kappa - c(k_i - 1)$. A similar situation arises around the node $j$. Replacing $k_i$ and $k_j$ with $\langle k \rangle$ in $\kappa_i$ and $\kappa_j$ one gets



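$$k_i k_j \left[ \kappa (1 - c') \right]^2 \left[ \kappa (1 - c) \right]^{x-3} = N \langle k \rangle, \qquad (4)$$

where $c' = c(\langle k \rangle - 1)/\kappa$. It follows that instead of (3) we have

$$A' = 3 + \frac{\log(N \langle k \rangle) - 2 \log[\kappa (1 - c')]}{\log[\kappa (1 - c)]}, \qquad B' = \frac{1}{\log[\kappa (1 - c)]}. \qquad (5)$$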
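The clustering-corrected coefficients of Eq. (5) can be evaluated in the same way as those of Eq. (3). The sketch below is illustrative and rests on assumptions that are not part of the text: logarithms are taken to base 10, and for the Astro example the value of $\kappa$ is not given directly, so it is inferred here from the tabulated $B = 0.595$ via $\kappa = 10^{1/B}$. The function name is hypothetical.

```python
import math


def clustered_coefficients(n_nodes, mean_degree, kappa, c):
    """A' and B' of Eq. (5), with base-10 logarithms (an assumption).

    Requires kappa * (1 - c) > 1 so that the logarithm in the
    denominator is positive.
    """
    c_prime = c * (mean_degree - 1.0) / kappa
    log_bulk = math.log10(kappa * (1.0 - c))        # factor away from i and j
    log_ends = math.log10(kappa * (1.0 - c_prime))  # factor around i and around j
    a_prime = 3.0 + (math.log10(n_nodes * mean_degree) - 2.0 * log_ends) / log_bulk
    b_prime = 1.0 / log_bulk
    return a_prime, b_prime


if __name__ == "__main__":
    # Astro co-authorship network: N, <k> and c as tabulated;
    # kappa is inferred from the tabulated B = 0.595 (an assumption).
    kappa_astro = 10.0 ** (1.0 / 0.595)
    a_prime, b_prime = clustered_coefficients(13986, 25.56, kappa_astro, 0.609)
    print("A' =", round(a_prime, 2), "B' =", round(b_prime, 3))
```

For $c = 0$ both correction factors reduce to $\kappa$, and the expressions fall back to $A$ and $B$ of Eq. (3), which is a convenient consistency check.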

Table 1 contains a comparison between the experimental data (Fig. 1) and the theoretical predictions given by Eqs. (3) and (5). One can observe that the range of the parameters $A$ and $B$ for different networks is very broad. Our approximate approach fits random Erdős-Rényi graphs and BA models very well and is fairly good for co-authorship and biological networks as well as for the Internet Autonomous Systems and the public transport network in Zielona Gora, while for the two other transport systems it leads to larger errors. Corrections due to clustering effects give a better fit for the coefficient $A'$, while for some networks the coefficient $B$ is closer to the experimental value $B_e$ than $B'$. The good agreement between the theory based on random networks and the empirical data suggests that the considered real networks exhibit a large level of randomness.

In conclusion, we have observed universal path length scaling for different classes of real networks and models.

network         N      ⟨k⟩    c      A_e    A      ΔA/A   A'     ΔA'/A'  B_e    B      ΔB/B   B'     ΔB'/B'
Erdős-Rényi     10000  20.00  0.002  5.48   5.08   -0.08  5.08   0.08    0.798  0.769  -0.04  0.770  -0.04
Erdős-Rényi     50000  20.00  0.000  5.86   5.61   -0.04  5.61   -0.04   0.729  0.769  0.05   0.769  0.05
Barabási-Albert 1000   8.00   0.038  4.54   4.24   -0.07  4.27   -0.06   0.813  0.830  0.02   0.842  0.03
Barabási-Albert 10000  8.00   0.007  5.17   4.81   -0.08  4.81   -0.07   0.778  0.777  0.00   0.779  0.00
Astro           13986  25.56  0.609  5.24   4.30   -0.22  4.98   -0.05   0.707  0.595  -0.19  0.786  0.10
Cond-mat        17013  9.46   0.604  5.90   5.09   -0.16  6.38   0.08    0.908  0.786  -0.16  1.150  0.21
Silwood         153    4.77   0.142  4.22   3.69   -0.14  3.78   -0.12   0.955  0.941  -0.02  1.004  0.05
Yeast           1846   2.39   0.068  7.53   6.66   -0.13  6.87   -0.10   1.406  1.552  0.09   1.629  0.14
Ythan           135    8.83   0.216  3.39   3.35   -0.01  3.45   0.02    0.649  0.765  0.15   0.832  0.22
AS 1997         3015   3.42   0.182  3.99   3.39   -0.18  3.42   -0.17   0.562  0.596  0.06   0.629  0.11
AS 1998         4180   3.72   0.250  4.08   3.41   -0.20  3.45   -0.18   0.555  0.575  0.04   0.620  0.10
AS 1999         5861   3.86   0.250  4.03   3.35   -0.20  3.38   -0.19   0.532  0.540  0.01   0.579  0.08
AS 2001         10515  4.08   0.289  3.96   3.23   -0.23  3.25   -0.22   0.471  0.481  0.02   0.518  0.09
Gorzow Wlkp.    269    2.48   0.082  24.36  16.06  -0.52  19.76  -0.23   12.27  5.333  -1.30  6.651  -0.84
Lodz            1023   2.83   0.065  24.01  11.67  -1.06  12.70  -0.89   8.621  3.084  -1.80  3.389  -1.54
Zielona Gora    312    2.98   0.067  10.03  8.96   -0.12  9.63   -0.04   3.908  2.682  -0.46  2.917  -0.34

TABLE 1: Comparison between experimental and theoretical data. Astro and Cond-mat are co-authorship networks; Silwood, Yeast and Ythan are biological networks; AS stands for the Internet Autonomous Systems, with the number meaning the year the data were gathered; Gorzow Wlkp., Lodz and Zielona Gora are public transport networks in the corresponding Polish cities. $N$ is the number of nodes, $\langle k \rangle$ the mean degree and $c$ the clustering coefficient. $A_e$ and $B_e$ are experimental values (Fig. 1), whereas $A$ and $B$ are given by Eq. (3) and $A'$ and $B'$ by Eq. (5). $\Delta A$, $\Delta A'$, $\Delta B$, $\Delta B'$ stand for the following differences: $A - A_e$, $A' - A_e$, $B - B_e$, $B' - B_e$.



The mean distance between nodes of degrees $k_i$ and $k_j$ is a linear function of $\log(k_i k_j)$. The scaling holds over many decades regardless of network degree distributions, correlations and clustering. A simple model of a random tree exploring the network explains such behavior and leads to a good agreement with experimental data. We expect that the observed scaling law is universal for many complex networks, with applicability reaching far beyond the quoted examples.